feat(pipeline): allow syncing blocks ontop of the proposed chain #21025

Open
Maddiaa0 wants to merge 2 commits into next from md/pipelining-syncing

Member

@Maddiaa0 Maddiaa0 commented Mar 3, 2026

Overview

Key contributions:

  • In the PR above (feat(pipeline): introduce pipeline views for building, #21026), publishing was a blocking action. In this PR, publishing becomes non-blocking: a publisher can schedule when it should start trying to publish a block.
  • This keeps track of valid checkpoints that are pending and not yet settled to L1, and allows building on top of them.

Adds a second p2p callback that separates what runs on all nodes from what runs only on validator nodes.

Testing

epochs_mbps.pipeline now expects 3 blocks per checkpoint, just like the original epochs_mbps test; it is now fully pipelined.

Upcoming

  • Updating the timetable to allow more time for block building in the slot; this PR does not extend the time allocated to block building.
  • Handling rollbacks when the pendingCheckpoint needs to be rolled back or cleared.

@Maddiaa0 Maddiaa0 force-pushed the md/pipelining-syncing branch from 8de6157 to 31f941d Compare March 3, 2026 20:51
@Maddiaa0 Maddiaa0 force-pushed the md/update-epoch-cache-for-buildahead branch from 3a75fdb to 1ca98d8 Compare March 3, 2026 20:51
Comment on lines +74 to +76
// Atomically set the pending checkpoint number alongside the block if provided
pendingCheckpointNumber !== undefined &&
this.store.blockStore.setPendingCheckpointNumber(pendingCheckpointNumber),
Contributor

This doesn't match the comment that this Sets the pending checkpoint number (quorum-attested but not yet L1-confirmed), right?

That said, let's discuss the model. Weirdly, I like the concept of having an uncheckpointed checkpoint. But it seems like we have two different things:

  • A checkpoint-being-built, which is the checkpoint being built via proposed blocks. This is just a checkpoint number, since a Checkpoint object needs all its blocks to be ready. Today we already have this, but don't need to explicitly store it.
  • A checkpoint-being-proposed, which is the Checkpoint object for which the current proposer has sent a checkpoint proposal, and would ultimately make it onto L1.

We need to define which ones we expose to clients of the archiver, and also to users via APIs like getL2Tips.

@Maddiaa0 Maddiaa0 force-pushed the md/pipelining-syncing branch 2 times, most recently from 6a16e98 to 28a0520 Compare March 6, 2026 17:19
@Maddiaa0 Maddiaa0 force-pushed the md/pipelining-syncing branch 2 times, most recently from 27f8a4b to f5c6308 Compare March 9, 2026 12:12
@Maddiaa0 Maddiaa0 force-pushed the md/update-epoch-cache-for-buildahead branch from b1b1b14 to ce0e422 Compare March 9, 2026 12:12
@Maddiaa0 Maddiaa0 force-pushed the md/pipelining-syncing branch from f5c6308 to ae6b651 Compare March 9, 2026 12:26
@Maddiaa0 Maddiaa0 marked this pull request as ready for review March 9, 2026 12:29
@Maddiaa0 Maddiaa0 force-pushed the md/update-epoch-cache-for-buildahead branch from ce0e422 to d8b8d31 Compare March 9, 2026 16:01
@Maddiaa0 Maddiaa0 force-pushed the md/pipelining-syncing branch 2 times, most recently from 3a47fbf to 4bb6845 Compare March 9, 2026 16:07
@Maddiaa0 Maddiaa0 force-pushed the md/update-epoch-cache-for-buildahead branch from d8b8d31 to 6973d24 Compare March 9, 2026 16:41
@Maddiaa0 Maddiaa0 force-pushed the md/pipelining-syncing branch from 4bb6845 to 0e9b52e Compare March 9, 2026 16:41
@Maddiaa0 Maddiaa0 force-pushed the md/pipelining-syncing branch from 0e9b52e to dee426a Compare March 12, 2026 15:52
@Maddiaa0 Maddiaa0 force-pushed the md/update-epoch-cache-for-buildahead branch from 6973d24 to 21cdcd5 Compare March 12, 2026 15:52
@Maddiaa0 Maddiaa0 force-pushed the md/update-epoch-cache-for-buildahead branch from 21cdcd5 to 9543a61 Compare March 16, 2026 18:08
@Maddiaa0 Maddiaa0 changed the base branch from md/update-epoch-cache-for-buildahead to graphite-base/21025 March 18, 2026 15:27
@Maddiaa0 Maddiaa0 force-pushed the graphite-base/21025 branch from 9bba2e5 to 4902133 Compare March 19, 2026 00:41
@Maddiaa0 Maddiaa0 force-pushed the md/pipelining-syncing branch from 2548aed to 4c620ca Compare March 19, 2026 00:41
@Maddiaa0 Maddiaa0 changed the base branch from graphite-base/21025 to merge-train/spartan March 19, 2026 00:41
}

/** Sets the pipelining tree-in-progress boundary for building ahead of L1 confirmation. */
public setPipeliningTreeInProgress(value: bigint): Promise<void> {
Member Author

This was added to deal with #21494. I did not want to directly set the other tree in progress. I think my current solution is nasty and would appreciate a discussion around it.

Contributor

I'm not sure I understand the need for this, even after going through all its uses. The tree-in-progress is a property of the Inbox L1 contract, not of the L2 chain; its goal is to avoid consuming messages for a given checkpoint too early. If the check added in 21494 is throwing, it means we have a bigger problem and need to adjust the inbox lag.

debugLogStore,
);

// Register a callback for all nodes to set the pending checkpoint number when a checkpoint proposal is received.
Member Author

Adds a lovely callback. The existing one was only relevant to nodes performing validation duties; allNodesCheckpointProposalHandler is triggered by everyone, not just validators.

Contributor

Today the validator-client is running a bunch of validations after it receives the proposal from libp2p service (they are not done in libp2p service mostly because they are slow). This callback means a node will push into their archiver potentially invalid checkpoint proposals, which is bad.

And yeah, I know it's confusing, since the method in libp2pservice is called processValidCheckpointProposal, even though the proposal is still missing a bunch of validations.

We had this same issue with mbps, since regular nodes then needed to start following block proposals. So we extracted the block validation logic into a block proposal handler, which is always installed regardless of whether the node is a validator. Maybe we should do the same here?

}

/** Merges multiple StateOverride arrays, combining stateDiff entries for the same address. */
public static mergeStateOverrides(...overrides: StateOverride[]): StateOverride {
Member Author

This duplicates packing the fee slot

Whenever checkpoint blocks become very full, the fee will drift from block to block; this prevents simulations into the future from failing.
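
For readers following along, here is a minimal sketch of what merging overrides "for the same address" could look like. The types below are simplified stand-ins defined locally, not the real StateOverride type used in the PR, and the rule that later overrides win on a conflicting slot is an assumption.

```typescript
// Simplified local types (assumption): the real StateOverride has more fields.
type SlotOverride = { slot: string; value: string };
type AddressOverride = { address: string; stateDiff?: SlotOverride[] };
type StateOverride = AddressOverride[];

// Merges several override arrays, combining stateDiff entries per address.
// Addresses are compared case-insensitively; for a repeated slot the later value wins.
function mergeStateOverrides(...overrides: StateOverride[]): StateOverride {
  const byAddress = new Map<string, Map<string, string>>();
  for (const override of overrides) {
    for (const entry of override) {
      const key = entry.address.toLowerCase();
      const slots = byAddress.get(key) ?? new Map<string, string>();
      for (const { slot, value } of entry.stateDiff ?? []) {
        slots.set(slot, value);
      }
      byAddress.set(key, slots);
    }
  }
  return Array.from(byAddress.entries()).map(([address, slots]) => ({
    address,
    stateDiff: Array.from(slots.entries()).map(([slot, value]) => ({ slot, value })),
  }));
}
```

With this shape, a fee-slot override and a pending-checkpoint override that both target the same contract collapse into a single entry instead of two competing ones.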

// Knowledge of pending checkpoints is in the PR above
const { targetSlot } = this.epochCache.getTargetAndNextSlot();
if (targetSlot <= this.lastSlotProcessed) {
let slot;

@Maddiaa0 Maddiaa0 force-pushed the md/pipelining-syncing branch from 4c620ca to c2cc23c Compare March 19, 2026 00:55
@Maddiaa0 Maddiaa0 force-pushed the md/pipelining-syncing branch from c2cc23c to 366f809 Compare March 19, 2026 02:07
await store.addCheckpoints([checkpoint1]);

// Set pending checkpoint to 3 (far ahead)
await store.blockStore.setPendingCheckpoint({
Collaborator

Should it allow this?

Member Author

You have a good point, will restrict

Member Author

updated

await store.addCheckpoints([checkpoint1, checkpoint2, checkpoint3]);

// Set pending to 2 (already confirmed, but that's fine for the test)
await store.blockStore.setPendingCheckpoint({
Collaborator

Should this be allowed either?

Member Author

updated

lastArchive: block2.archive,
});

// Block for checkpoint 2 should work (previous confirmed = 1)
Collaborator

Maybe I misunderstand the flow, but should we allow adding proposed blocks for multiple different checkpoints?

Unless we are already potentially supporting build ahead of multiple checkpoints?

Member Author

It has been restricted to +1

block: { number: provenBlockNumber, hash: provenBlockData.blockHash.toString() },
checkpoint: provenCheckpointId,
},
pendingCheckpoint: pendingCheckpointBlockData
Collaborator

Shall we just call this pending?

Contributor

We definitely need to rethink naming. The rollup contract already has the concept of "pending checkpoint", which is actually the checkpointed checkpoint. We need to change one of the two.

Other thoughts, in random order, some mutually exclusive:

  • Do we want to bundle the pending checkpoint with the proposed chain tip? It'll break the property that, in a chain tip, the reported block matches the last block of the reported checkpoint. But maybe it's fine.
  • Do we want to differentiate attested vs non-attested pending checkpoints? Personally I don't think so.
  • Should the pendingCheckpoint tip return the checkpointed one if there's no pending checkpoint block data? It's the only chain tip that may be undefined.
  • Do we want to make a bigger rename? It seems like we're dealing with "pending checkpoints" and "checkpointed checkpoints". Maybe "checkpoint" was the bad name, and we should be talking about "pending bundles" and "checkpointed bundles" or something like that?

No need to make these naming changes in this PR, but let's give it a good thought before closing this epic.

// The checkpoint proposal often arrives before the last block finishes re-execution.
// Trigger a sync to flush any queued blocks, then retry.
if (!blockData) {
await archiver.syncImmediate();
Collaborator

Will probably need a retry loop here won't we?

Member Author

added

this.allNodesCheckpointReceivedCallback = (
_checkpoint: CheckpointProposalCore,
): Promise<CheckpointAttestation[] | undefined> => {
return Promise.resolve(undefined);
Collaborator

Should probably log an error here shouldn't we? Everyone should register a handler here.

Member Author

removed

this.validatorCheckpointReceivedCallback = (
checkpoint: CheckpointProposalCore,
): Promise<CheckpointAttestation[] | undefined> => {
this.logger.debug(
Collaborator

And this log line could maybe go if this is validator only. Otherwise full nodes will print this routinely.

@Maddiaa0 Maddiaa0 force-pushed the md/pipelining-syncing branch 2 times, most recently from dcd68a6 to 2d31671 Compare March 19, 2026 13:58
const canProposeCheck = await publisher.canProposeAt(syncedTo.archive, proposer ?? EthAddress.ZERO, {
...invalidateCheckpoint,
});
// Determine the correct archive and L1 state overrides for the canProposeAt check.
Member Author

This block has been edited since last review @PhilWindle

}

// What's the slot of the first uncheckpointed block?
// Don't prune blocks that are covered by a pending checkpoint (awaiting L1 submission from pipelining)
Collaborator

Slightly confused by this. What happens if a checkpoint fails to land on L1? Surely the blocks covered by that (pending) checkpoint are removed? As well as all blocks built afterwards?

Member Author

Ah, I'm currently dealing with this in the md/pipeline-recovery-2 branch.

Contributor

Looking forward to it!

@Maddiaa0 Maddiaa0 force-pushed the md/pipelining-syncing branch from 2d31671 to 2591475 Compare March 19, 2026 21:03
// Trigger syncs to flush any queued blocks, retrying until we find the data or give up.
if (!blockData) {
blockData = await retryUntil(
async () => {
Collaborator

I think we probably want to

  1. Not call syncImmediate here. That's going to force the archiver to query L1 aggressively. When the block is pushed to the archiver it already calls that function, so we aren't going to make things go any faster.
  2. Use a more appropriate timeout. Presumably that would be the end of the slot?
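
As a rough illustration of the second point, the retry timeout could be derived from the slot boundary rather than hard-coded. This is only a sketch: the helper name, its parameters, and the assumption that slot 0 starts at genesis time are all hypothetical and not taken from the PR.

```typescript
// Hypothetical helper (not part of the PR): milliseconds left until the given slot ends.
// Assumes slot 0 starts at genesisTimeSec and every slot lasts slotDurationSec seconds.
function msUntilSlotEnd(
  nowMs: number,
  genesisTimeSec: number,
  slot: number,
  slotDurationSec: number,
): number {
  const slotEndMs = (genesisTimeSec + (slot + 1) * slotDurationSec) * 1000;
  // Never return a negative timeout if the slot has already ended.
  return Math.max(0, slotEndMs - nowMs);
}
```

The result would then be passed as the timeout of the retry helper, so the wait for block data gives up exactly when the slot is over.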

Contributor

@spalladino spalladino left a comment

Awesome work man

Comment on lines +77 to +85
/** Storage format for a pending checkpoint (attested but not yet L1-confirmed). */
type PendingCheckpointStore = {
header: Buffer;
checkpointNumber: number;
startBlock: number;
blockCount: number;
totalManaUsed: string;
feeAssetPriceModifier: string;
};
Contributor

Why can't we capture archive, outhash, or all data that's not L1 or attestations?

Also, nit: rename to PendingCheckpointStorage for consistency with the other types here.

Member Author

@Maddiaa0 Maddiaa0 Mar 20, 2026

it can, I just kept the minimum; will add

Comment on lines +201 to +213
// The same check as above but for checkpoints. Accept the block if either the confirmed
// checkpoint or the pending (locally validated but not yet confirmed) checkpoint matches.
const expectedCheckpointNumber = blockCheckpointNumber - 1;
if (
!opts.force &&
previousCheckpointNumber !== expectedCheckpointNumber &&
pendingCheckpointNumber !== expectedCheckpointNumber
) {
const [reported, source]: [CheckpointNumber, 'confirmed' | 'pending'] =
pendingCheckpointNumber > previousCheckpointNumber
? [pendingCheckpointNumber, 'pending']
: [previousCheckpointNumber, 'confirmed'];
throw new CheckpointNumberNotSequentialError(blockCheckpointNumber, reported, source);
Contributor

Is there any situation where addProposedBlock would add a block for the pending checkpoint? My understanding is we add proposed blocks, then throw a checkpoint proposal on top to flag those as "pending checkpointing", and then keep adding proposed blocks for the next one.

Contributor

Also, adding a block to a pending checkpoint breaks the blockCount property of the PendingCheckpointStore. Seems like we should not allow that.

Member Author

No, we should not end up adding directly to the pending checkpoint, only above it.

This case is to allow building on top of the pending checkpoint, not adding to it. But it looks like it may allow what you have mentioned; I'll make it more strict.
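
One possible reading of "more strict", sketched with plain numbers instead of the CheckpointNumber type: a proposed block is accepted only for the checkpoint immediately above the current tip, where the tip is the pending checkpoint if there is one ahead of the confirmed chain. The function names and the handling of a stale pending value are assumptions, not the PR's final code.

```typescript
// Hypothetical sketch: which checkpoint number a new proposed block must belong to.
// If a pending checkpoint sits above the confirmed one, blocks go on top of it, never into it.
function expectedBlockCheckpoint(confirmed: number, pending?: number): number {
  const tip = pending !== undefined && pending > confirmed ? pending : confirmed;
  return tip + 1;
}

// Returns true only when the block targets exactly the next checkpoint above the tip.
function canAddProposedBlock(blockCheckpoint: number, confirmed: number, pending?: number): boolean {
  return blockCheckpoint === expectedBlockCheckpoint(confirmed, pending);
}
```

Under this rule, with checkpoint 1 confirmed and checkpoint 2 pending, a block for checkpoint 2 is rejected (it would change the pending checkpoint's block count) while a block for checkpoint 3 is accepted.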

Comment on lines +1025 to +1036
const current = await this.getPendingCheckpointNumber();
if (pending.checkpointNumber <= current) {
this.#log.warn(`Ignoring stale pending checkpoint number ${pending.checkpointNumber} (current: ${current})`);
return;
}
const confirmed = await this.getLatestCheckpointNumber();
if (pending.checkpointNumber !== confirmed + 1) {
this.#log.warn(
`Ignoring pending checkpoint ${pending.checkpointNumber}: expected ${confirmed + 1} (confirmed + 1)`,
);
return;
}
Contributor

Unless we can think of legitimate situations for this, I'd throw instead of warning. It will help us catch inconsistencies more easily.
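
A minimal sketch of the throwing variant, written as a standalone function over plain numbers. The function name and error class are illustrative only; the real method reads both values from the store and uses the CheckpointNumber type.

```typescript
// Hypothetical error type for illustration; the PR may use a different one.
class InvalidPendingCheckpointError extends Error {}

// Throws when the new pending checkpoint is stale or is not exactly confirmed + 1.
function assertValidPendingCheckpoint(
  pendingNumber: number,
  currentPendingNumber: number,
  confirmedNumber: number,
): void {
  if (pendingNumber <= currentPendingNumber) {
    throw new InvalidPendingCheckpointError(
      `Stale pending checkpoint ${pendingNumber} (current: ${currentPendingNumber})`,
    );
  }
  if (pendingNumber !== confirmedNumber + 1) {
    throw new InvalidPendingCheckpointError(
      `Pending checkpoint ${pendingNumber} must be ${confirmedNumber + 1} (confirmed + 1)`,
    );
  }
}
```

Callers that can legitimately race (for example, a late proposal arriving after L1 confirmation) would then have to catch the error explicitly, which makes those cases visible instead of silently logged.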

if (this.interrupted) {
return undefined;
}
return this.sendRequests();
Contributor

Should we re-simulate before sending, this time with the actual state of L1, instead of the manual overrides? This should help catch scenarios where the previous checkpoint didn't behave as we expected (not to mention the ones where it didn't land).

Member Author

@Maddiaa0 Maddiaa0 Mar 20, 2026

In #21250 I add a preCheck hook to each submission that runs after the enqueue sleep, without the overrides.

Comment on lines +298 to +302
const grandparentCheckpointNumber = CheckpointNumber(this.checkpointNumber - 2);
const [grandparentCheckpoint, manaTarget] = await Promise.all([
rollup.getCheckpoint(grandparentCheckpointNumber),
rollup.getManaTarget(),
]);
Contributor

I guess this works because we're guaranteed to have a grandparent checkpoint if there's a non-undefined pendingCheckpointData? Otherwise I see this failing in the first checkpoint(s)?

Comment on lines +75 to +76
/** The last checkpoint proposal job, tracked so we can await its pending L1 submission during shutdown. */
private lastCheckpointProposalJob: CheckpointProposalJob | undefined;
Contributor

Stupid question: why do we want to await it?

Member Author

I was just thinking that it has already been validated and you'll get rewards, so you may as well wait to get them. I don't know if there's a clear need; I just thought it was nice to have.

Comment on lines +373 to +389
if (invalidateCheckpoint) {
// After invalidation, L1 will roll back to checkpoint N-1. The archive at N-1 already
// exists on L1, so we just pass the matching archive (the lastArchive of the invalid checkpoint).
archiveForCheck = invalidateCheckpoint.lastArchive;
l1Overrides.forcePendingCheckpointNumber = invalidateCheckpoint.forcePendingCheckpointNumber;
this.metrics.recordPipelineDepth(0);
} else if (this.epochCache.isProposerPipeliningEnabled() && syncedTo.hasPendingCheckpoint) {
// Parent checkpoint hasn't landed on L1 yet. Override both the pending checkpoint number
// and the archive at that checkpoint so L1 simulation sees the correct chain tip.
const parentCheckpointNumber = CheckpointNumber(checkpointNumber - 1);
l1Overrides.forcePendingCheckpointNumber = parentCheckpointNumber;
l1Overrides.forceArchive = { checkpointNumber: parentCheckpointNumber, archive: syncedTo.archive };
this.metrics.recordPipelineDepth(1);

this.log.verbose(
`Building on top of pending checkpoint (pending=${syncedTo.pendingCheckpointData?.checkpointNumber})`,
);
Contributor

Shouldn't the order of these ifs be switched? If there's a pending checkpoint AND the tip of the checkpointed chain is invalid, we should expect the pending checkpoint to perform the invalidation when it lands, so we should build on top of the pending checkpoint instead.
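
The suggested ordering could be sketched as a small decision function. This only illustrates the reviewer's proposal: the type, field names, and return values are hypothetical, and the actual override setup (archive, forcePendingCheckpointNumber, metrics) is left out.

```typescript
// Hypothetical labels for the three branches discussed above.
type BuildBase = "pending-checkpoint" | "invalidate-confirmed-tip" | "confirmed-tip";

// Checks the pending checkpoint first, so it takes priority over invalidation.
function chooseBuildBase(state: {
  pipeliningEnabled: boolean;
  hasPendingCheckpoint: boolean;
  confirmedTipInvalid: boolean;
}): BuildBase {
  if (state.pipeliningEnabled && state.hasPendingCheckpoint) {
    // The pending checkpoint is expected to perform the invalidation when it lands.
    return "pending-checkpoint";
  }
  if (state.confirmedTipInvalid) {
    return "invalidate-confirmed-tip";
  }
  return "confirmed-tip";
}
```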

* Note: this checks against the checkpointed chain (L1-confirmed state), not the proposed chain.
*/
protected async checkSync(args: { ts: bigint; slot: SlotNumber }): Promise<SequencerSyncCheckResult | undefined> {
// Check that the archiver has fully synced the L2 slot before the one we want to propose in.
Contributor

Need to update this comment

Base automatically changed from merge-train/spartan to next March 20, 2026 22:34
3 participants